Text Recognition and Translation of Multi-Oriented, Multi-Language and Curved Text in Natural Scene Images
Abstract
This study addresses text detection and recognition in natural scene images. It focuses on the detection, recognition and, eventually, translation of multi-oriented, multi-language and curvilinear text in such images. The study attempts to provide a solution that can detect and recognise such text, since leading mobile applications such as Word Lens and Google Goggles do not support it for translation. Many algorithms can detect and recognise text, but very few consider text that is multi-oriented, multi-script or curvilinear. Text detection can be carried out using various methods, including region-based and texture-based methods. Algorithms for multi-oriented text detection are further divided into non-headline-based and headline-based methods. Three solutions were considered in this study. The first was an algorithm developed specifically for maps, in which text usually has various orientations, curvatures and sizes. The second was a framework that performs detection and recognition simultaneously. The third was a combination of two algorithms, one for detection and one for aligning curved and multi-oriented text. The research consisted of implementing the third option and integrating these two algorithms into a system that could detect and recognise multi-oriented, multi-script and curvilinear text. The proposed system was tested on publicly available data sets intended for evaluating such systems. Results were recorded according to the criteria of time, precision and recall. In addition, the proposed system was compared to two leading applications: Word Lens and Google Goggles. The study found that the resulting system was not capable of detecting and recognising all kinds of images and text.
On the other hand, it did offer a solution to issues that other leading applications face, such as the detection of multi-oriented and multi-script text. Results can be affected by various factors, including the libraries and languages used to implement an algorithm. In addition, the device used and its processing power strongly affect the performance, and probably also the precision, of a system. Other factors to consider when implementing such a system are the data sets used for testing and the implementer's experience and knowledge in the area.
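The abstract names precision and recall as evaluation criteria but does not say how detections were matched to ground truth. The following is a minimal sketch of one common way to compute both, assuming axis-aligned bounding boxes, greedy one-to-one matching and an intersection-over-union (IoU) threshold of 0.5. The box format, the threshold and the function names are illustrative assumptions, not the study's protocol; multi-oriented or curved text would normally be scored with rotated boxes or polygons instead.

```python
from typing import List, Tuple

# Assumed box format (not from the study): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    detected: List[Box], truth: List[Box], threshold: float = 0.5
) -> Tuple[float, float]:
    """Greedily match each detection to its best unmatched ground-truth box.

    A detection counts as a true positive when its best IoU reaches the
    threshold (0.5 here is an assumed, commonly used value).
    """
    unmatched = list(truth)
    true_positives = 0
    for det in detected:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= threshold:
            unmatched.remove(best)  # each ground-truth box is matched once
            true_positives += 1
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical example: two detections, three ground-truth text regions.
detected_boxes = [(0, 0, 10, 10), (50, 50, 60, 60)]
truth_boxes = [(0, 0, 10, 10), (20, 20, 30, 30), (100, 100, 110, 110)]
precision, recall = precision_recall(detected_boxes, truth_boxes)
```

In this hypothetical example only the first detection overlaps a ground-truth region, so one of two detections is correct and one of three text regions is found.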
Similar resources
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in the English language are hyphenated, with the hyphen used to separate the different parts. The Persian language has multi-part words as well. Based on Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use the space character instead of the half-space character. This common incorrect use of space leads to some s...
Natural scene text localization using edge color signature
Localizing text regions in images taken from natural scenes is one of the challenging problems due to variations in font, size, color and orientation of text. In this paper, we introduce a new concept, so-called Edge Color Signature, for localizing text regions in an image. This method is able to localize both Farsi and English texts. In the proposed method, first a pyramid using diff...
Scene Text Detection via Holistic, Multi-Channel Prediction
Recently, scene text detection has become an active research topic in computer vision and document analysis, because of its great importance and significant challenge. However, the vast majority of the existing methods detect text within local regions, typically through extracting character, word or line level candidates followed by candidate aggregation and false positive elimination, which potent...
E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text
An end-to-end method for multi-language scene text localization, recognition and script identification is proposed. The approach is based on a set of convolutional neural nets. The method, called E2E-MLT, achieves state-of-the-art performance for both joint localization and script identification in natural images and in cropped word script identification. E2E-MLT is the first published multi-lan...
A Dataset and Evaluation Metric for Coherent Text Recognition from Scene Images
In this paper, we deal with extraction of textual information from scene images. So far, the task of Scene Text Recognition (STR) has only been focusing on recognition of isolated words and, for simplicity, it omits words which are too short. Such an approach is not suitable for further processing of the extracted text. We define a new task which aims at extracting coherent blocks of text from ...